Access Map Pattern Matching for High Performance Data Cache Prefetch

Authors

  • Yasuo Ishii
  • Mary Inaba
  • Kei Hiraki

Abstract

Hardware data prefetching is widely adopted to hide long memory latency. A hardware data prefetcher predicts the memory address that will be accessed in the near future and fetches the data at the predicted address into the cache memory in advance. To detect memory access patterns such as a constant stride, most existing prefetchers use the differences between addresses in a sequence of memory accesses. However, prefetching based on these differences often fails to detect memory access patterns when aggressive optimizations are applied. For example, out-of-order execution changes the memory access order. This causes inaccurate predictions because the sequence of memory addresses used to calculate the differences is changed by the optimization. To overcome the problems of existing prefetchers, we propose Access Map Pattern Matching (AMPM). The AMPM prefetcher has two key components: a memory access map and hardware pattern matching logic. The memory access map is a bitmap-like data structure that holds past memory accesses. The AMPM prefetcher divides the memory address space into memory regions of a fixed size, and a memory access map is mapped to each region. Each entry in the bitmap-like data structure corresponds to one cache line in the region. Once the bitmap is mapped to a memory region, each entry records whether the corresponding line has already been accessed. The AMPM prefetcher detects memory access patterns from the bitmap-like data structure that is mapped to the accessed region. The hardware pattern matching logic detects stride access patterns in the memory access map. The result of pattern matching is affected by neither the memory access order nor the instruction addresses, because the bitmap-like data structure holds neither the order of past memory accesses nor the instruction addresses. Therefore, the AMPM prefetcher achieves high performance even when such aggressive optimizations are applied. The AMPM prefetcher is evaluated with cycle-accurate simulations using the memory-intensive benchmarks in SPEC CPU2006 and the NAS Parallel Benchmarks. In an aggressively optimized environment, the AMPM prefetcher improves prefetch coverage, while the other state-of-the-art prefetcher degrades prefetch coverage significantly. As a result, the AMPM prefetcher increases IPC by 32.4% compared to the state-of-the-art prefetcher.
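The access map and stride matching described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the 64-byte line size, the 64-line region size, the names `AccessMap`, `record`, `candidates` and `demo`, and the matching rule (line t+k becomes a candidate when lines t-k and t-2k are already marked) are all chosen here for illustration. The abstract itself only states that a per-region bitmap records accessed lines and that stride patterns are detected from it.

```python
from itertools import permutations

LINE_SIZE = 64         # bytes per cache line (assumed for illustration)
LINES_PER_REGION = 64  # cache lines per memory region (assumed)


class AccessMap:
    """Hypothetical model: one bitmap per region, one bit per cache line."""

    def __init__(self):
        self.maps = {}  # region number -> int used as a bitmap

    @staticmethod
    def _split(addr):
        # Convert a byte address into (region number, line index in region).
        line = addr // LINE_SIZE
        return line // LINES_PER_REGION, line % LINES_PER_REGION

    def accessed(self, region, idx):
        # Lines outside the region are treated as not accessed.
        if not 0 <= idx < LINES_PER_REGION:
            return False
        return (self.maps.get(region, 0) >> idx) & 1 == 1

    def record(self, addr):
        # Mark the line containing addr; no ordering information is kept.
        region, idx = self._split(addr)
        self.maps[region] = self.maps.get(region, 0) | (1 << idx)

    def candidates(self, addr):
        """Byte addresses of lines that complete a stride through addr.

        Assumed rule: for stride s, if lines idx-s and idx-2s are marked
        and line idx+s is not, then idx+s is a prefetch candidate.
        """
        region, idx = self._split(addr)
        out = []
        for k in range(1, LINES_PER_REGION // 2):
            for stride in (k, -k):
                target = idx + stride
                if not 0 <= target < LINES_PER_REGION:
                    continue
                if (self.accessed(region, idx - stride)
                        and self.accessed(region, idx - 2 * stride)
                        and not self.accessed(region, target)):
                    out.append((region * LINES_PER_REGION + target) * LINE_SIZE)
        return out


def demo():
    """Replay lines 0, 2 and 4 in every order, then access line 6."""
    results = set()
    for order in permutations([0, 128, 256]):
        amap = AccessMap()
        for addr in order:
            amap.record(addr)
        amap.record(384)
        results.add(tuple(amap.candidates(384)))
    return results


if __name__ == "__main__":
    print(demo())  # {(512,)}
```

In this model every ordering of the three earlier accesses yields the same single candidate, byte address 512 (line 8), because the bitmap stores which lines were touched but not when. That is the order-independence property the abstract attributes to the access map.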


Similar resources

An Approach to Data Prefetching Using 2-Dimensional Selection Criteria

We propose an approach to data memory prefetching which augments the standard prefetch buffer with selection criteria based on performance and usage pattern of a given instruction. This approach is built on top of a pattern matching based prefetcher, specifically one which can choose between a stream, a stride, or a stream followed by a stride. We track the most recently called instructions to ...

Distributed Prefetch-buffer/Cache Design for High Performance Memory Systems

Microprocessor execution speeds are improving at a rate of 50%-80% per year while DRAM access times are improving at a much lower rate of 5%-10% per year. Computer systems are rapidly approaching the point at which overall system performance is determined not by the speed of the CPU but by the memory system speed. We present a high performance memory system architecture that overcomes the growi...

Neural Network based Mobility aware Prefetch Caching and Replacement Strategies in Mobile Environment

The Location Based Services (LBS) have ushered the way mobile applications access and manage Mobile Database System (MDS). Caching frequently accessed data into the mobile database environment, is an effective technique to improve the MDS performance. The cache size limitation enforces an optimized cache replacement algorithm to find a suitable subset of items for eviction from the cache. In wi...

Non-Referenced Prefetch (NRP) Cache for Instruction Prefetching

A new conceptual cache, NRP (Non-Referenced Prefetch) cache, is proposed to improve the performance of instruction prefetch mechanisms which try to prefetch both the sequential and non-sequential blocks under the limited memory bandwidth. The NRP cache is used in storing prefetched blocks which were not referenced by the CPU, while these blocks were discarded in other previous prefetch mechanis...

A Smart Cache for Improved Vector Performance

As the speed of microprocessors increases at a breath-taking rate, the gap between processor and memory system performance is getting worse. To alleviate this problem, all modern processors contain caches, but even using caches, processors cannot achieve their peak performance. We propose a mechanism, smart caching, which extends the power of conventional memory subsystems by including a prefet...


Journal:
  • J. Instruction-Level Parallelism

Volume 13

Publication date: 2011